Overview

Dataset statistics

Number of variables: 10
Number of observations: 3276
Missing cells: 1434
Missing cells (%): 4.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 256.1 KiB
Average record size in memory: 80.0 B
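
The missing-cell percentage follows from the counts above: missing cells divided by the total number of cells (variables × observations). The sketch below reproduces that arithmetic with the figures from this section; the function name is illustrative and is not part of pandas-profiling.

```python
def missing_cell_percentage(n_missing: int, n_variables: int, n_observations: int) -> float:
    """Return the missing cells as a percentage of all cells in the table."""
    total_cells = n_variables * n_observations
    return 100.0 * n_missing / total_cells


# Figures copied from the dataset statistics above.
pct = missing_cell_percentage(n_missing=1434, n_variables=10, n_observations=3276)
print(f"Missing cells (%): {pct:.1f}%")

# The per-variable missing counts listed under Alerts add up to the same total.
per_variable_missing = {"ph": 491, "Sulfate": 781, "Trihalomethanes": 162}
print(sum(per_variable_missing.values()))
```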

Variable types

Numeric: 9
Categorical: 1

Alerts

ph has 491 (15.0%) missing values [Missing]
Sulfate has 781 (23.8%) missing values [Missing]
Trihalomethanes has 162 (4.9%) missing values [Missing]
Hardness has unique values [Unique]
Solids has unique values [Unique]
Chloramines has unique values [Unique]
Conductivity has unique values [Unique]
Organic_carbon has unique values [Unique]
Turbidity has unique values [Unique]
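
The alerts above are consistent with two simple rules applied to the per-variable counts in this report: flag a variable as Missing when it has any missing values, and as Unique when its distinct count equals the number of observations. The sketch below applies those rules. They are inferred from the numbers shown here and are an assumption, not the documented pandas-profiling logic; the function and dictionary names are hypothetical.

```python
# Distinct and missing counts copied from the variable summaries in this report.
N_OBSERVATIONS = 3276
COLUMNS = {
    "ph": {"distinct": 2785, "missing": 491},
    "Hardness": {"distinct": 3276, "missing": 0},
    "Solids": {"distinct": 3276, "missing": 0},
    "Chloramines": {"distinct": 3276, "missing": 0},
    "Sulfate": {"distinct": 2495, "missing": 781},
    "Conductivity": {"distinct": 3276, "missing": 0},
    "Organic_carbon": {"distinct": 3276, "missing": 0},
    "Trihalomethanes": {"distinct": 3114, "missing": 162},
    "Turbidity": {"distinct": 3276, "missing": 0},
    "Potability": {"distinct": 2, "missing": 0},
}


def infer_alerts(columns, n_observations):
    """Return (variable, alert) pairs under the assumed rules described above."""
    alerts = []
    for name, stats in columns.items():
        # Assumed rule 1: any missing value raises a Missing alert.
        if stats["missing"] > 0:
            alerts.append((name, "Missing"))
        # Assumed rule 2: every observation holds a different value.
        if stats["distinct"] == n_observations:
            alerts.append((name, "Unique"))
    return alerts


alerts = infer_alerts(COLUMNS, N_OBSERVATIONS)
for name, alert in alerts:
    print(f"{name}: {alert}")
```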

Reproduction

Analysis started: 2021-10-24 07:12:31.096060
Analysis finished: 2021-10-24 07:12:41.822320
Duration: 10.73 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

ph
Real number (ℝ≥0)

MISSING

Distinct: 2785
Distinct (%): 100.0%
Missing: 491
Missing (%): 15.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.080794504
Minimum: 0
Maximum: 14
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4.487970742
Q1: 6.093091914
median: 7.036752104
Q3: 8.062066123
95-th percentile: 9.789818578
Maximum: 14
Range: 14
Interquartile range (IQR): 1.968974209

Descriptive statistics

Standard deviation: 1.594319519
Coefficient of variation (CV): 0.2251611055
Kurtosis: 0.7203155798
Mean: 7.080794504
Median Absolute Deviation (MAD): 0.984116999
Skewness: 0.02563044758
Sum: 19720.01269
Variance: 2.541854728
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
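
Several of the figures above are derived from others in the same summary: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below recomputes them from the reported ph values as a consistency check; the variable names are illustrative.

```python
# Values copied from the ph summary in this report.
minimum, maximum = 0.0, 14.0
q1, q3 = 6.093091914, 8.062066123
mean, std = 7.080794504, 1.594319519

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(f"Range: {value_range:g}")
print(f"IQR: {iqr:.9f}")
print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.9f}")
```

The same relationships hold for the other numeric variables in this report.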
Value                 Count  Frequency (%)
8.55409697            1      < 0.1%
6.538084087           1      < 0.1%
5.91580675            1      < 0.1%
8.136497869           1      < 0.1%
6.493764175           1      < 0.1%
6.977405633           1      < 0.1%
5.489248055           1      < 0.1%
2.558102799           1      < 0.1%
7.312109304           1      < 0.1%
6.704431913           1      < 0.1%
Other values (2775)   2775   84.7%
(Missing)             491    15.0%

Value                 Count  Frequency (%)
0                     1      < 0.1%
0.22749905            1      < 0.1%
0.97557799            1      < 0.1%
0.989912213           1      < 0.1%
1.431781555           1      < 0.1%
1.757037115           1      < 0.1%
1.844538366           1      < 0.1%
1.985383359           1      < 0.1%
2.128531434           1      < 0.1%
2.376768076           1      < 0.1%

Value                 Count  Frequency (%)
14                    1      < 0.1%
13.54124024           1      < 0.1%
13.34988856           1      < 0.1%
13.17540172           1      < 0.1%
12.24692807           1      < 0.1%
11.90773983           1      < 0.1%
11.89807803           1      < 0.1%
11.62114013           1      < 0.1%
11.56876797           1      < 0.1%
11.56316906           1      < 0.1%

Hardness
Real number (ℝ≥0)

UNIQUE

Distinct: 3276
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 196.369496
Minimum: 47.432
Maximum: 323.124
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 47.432
5-th percentile: 141.7632808
Q1: 176.8505378
median: 196.9676269
Q3: 216.6674562
95-th percentile: 249.6097689
Maximum: 323.124
Range: 275.692
Interquartile range (IQR): 39.81691838

Descriptive statistics

Standard deviation: 32.87976148
Coefficient of variation (CV): 0.1674382332
Kurtosis: 0.6157716822
Mean: 196.369496
Median Absolute Deviation (MAD): 19.84498915
Skewness: -0.03934170473
Sum: 643306.469
Variance: 1081.078715
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
204.8904555           1      < 0.1%
134.5602761           1      < 0.1%
170.1909123           1      < 0.1%
237.4610992           1      < 0.1%
171.2389255           1      < 0.1%
197.4281988           1      < 0.1%
195.7440741           1      < 0.1%
184.2318535           1      < 0.1%
187.8732835           1      < 0.1%
205.1505644           1      < 0.1%
Other values (3266)   3266   99.7%

Value                 Count  Frequency (%)
47.432                1      < 0.1%
73.49223369           1      < 0.1%
77.4595861            1      < 0.1%
81.71089527           1      < 0.1%
94.09130748           1      < 0.1%
94.81254522           1      < 0.1%
94.90897713           1      < 0.1%
97.2809086            1      < 0.1%
98.3679149            1      < 0.1%
98.45293051           1      < 0.1%

Value                 Count  Frequency (%)
323.124               1      < 0.1%
317.3381241           1      < 0.1%
311.3839565           1      < 0.1%
308.2538329           1      < 0.1%
307.7060241           1      < 0.1%
306.6274814           1      < 0.1%
304.2359121           1      < 0.1%
303.7026267           1      < 0.1%
300.2924758           1      < 0.1%
298.0986795           1      < 0.1%

Solids
Real number (ℝ≥0)

UNIQUE

Distinct: 3276
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22014.09253
Minimum: 320.9426113
Maximum: 61227.19601
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 320.9426113
5-th percentile: 9545.812579
Q1: 15666.6903
median: 20927.8336
Q3: 27332.76213
95-th percentile: 38474.99025
Maximum: 61227.19601
Range: 60906.2534
Interquartile range (IQR): 11666.07183

Descriptive statistics

Standard deviation: 8768.570828
Coefficient of variation (CV): 0.3983162521
Kurtosis: 0.4428260858
Mean: 22014.09253
Median Absolute Deviation (MAD): 5809.47186
Skewness: 0.6216344855
Sum: 72118167.12
Variance: 76887834.36
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
20791.31898           1      < 0.1%
15979.33479           1      < 0.1%
37000.95567           1      < 0.1%
18736.1909            1      < 0.1%
12289.90092           1      < 0.1%
15979.06027           1      < 0.1%
12431.80311           1      < 0.1%
30031.83918           1      < 0.1%
29532.615             1      < 0.1%
19821.33837           1      < 0.1%
Other values (3266)   3266   99.7%

Value                 Count  Frequency (%)
320.9426113           1      < 0.1%
728.7508296           1      < 0.1%
1198.943699           1      < 0.1%
1351.906979           1      < 0.1%
1372.091043           1      < 0.1%
2552.962804           1      < 0.1%
2808.025756           1      < 0.1%
2835.303165           1      < 0.1%
2912.211247           1      < 0.1%
3413.081633           1      < 0.1%

Value                 Count  Frequency (%)
61227.19601           1      < 0.1%
56867.85924           1      < 0.1%
56488.67241           1      < 0.1%
56351.3963            1      < 0.1%
56320.58698           1      < 0.1%
55334.7028            1      < 0.1%
53735.89919           1      < 0.1%
52318.9173            1      < 0.1%
52060.2268            1      < 0.1%
51731.82055           1      < 0.1%

Chloramines
Real number (ℝ≥0)

UNIQUE

Distinct: 3276
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.122276793
Minimum: 0.352
Maximum: 13.127
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 0.352
5-th percentile: 4.50305371
Q1: 6.127420755
median: 7.130298974
Q3: 8.114887032
95-th percentile: 9.753100546
Maximum: 13.127
Range: 12.775
Interquartile range (IQR): 1.987466276

Descriptive statistics

Standard deviation: 1.583084889
Coefficient of variation (CV): 0.2222723063
Kurtosis: 0.5899011689
Mean: 7.122276793
Median Absolute Deviation (MAD): 0.9916613425
Skewness: -0.01209843999
Sum: 23332.57878
Variance: 2.506157766
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
7.300211873           1      < 0.1%
9.504361027           1      < 0.1%
6.217222542           1      < 0.1%
5.599870342           1      < 0.1%
10.78649982           1      < 0.1%
7.424944591           1      < 0.1%
6.6616162             1      < 0.1%
6.21530731            1      < 0.1%
7.981036899           1      < 0.1%
6.344963412           1      < 0.1%
Other values (3266)   3266   99.7%

Value                 Count  Frequency (%)
0.352                 1      < 0.1%
0.530351295           1      < 0.1%
1.390870905           1      < 0.1%
1.683992581           1      < 0.1%
1.920271449           1      < 0.1%
2.102690991           1      < 0.1%
2.386653494           1      < 0.1%
2.39798499            1      < 0.1%
2.456013596           1      < 0.1%
2.458609195           1      < 0.1%

Value                 Count  Frequency (%)
13.127                1      < 0.1%
13.04380611           1      < 0.1%
12.91218664           1      < 0.1%
12.65336202           1      < 0.1%
12.62689974           1      < 0.1%
12.58002649           1      < 0.1%
12.36328483           1      < 0.1%
12.27937418           1      < 0.1%
12.2463941            1      < 0.1%
12.22717528           1      < 0.1%

Sulfate
Real number (ℝ≥0)

MISSING

Distinct: 2495
Distinct (%): 100.0%
Missing: 781
Missing (%): 23.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 333.7757766
Minimum: 129
Maximum: 481.0306423
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 129
5-th percentile: 266.6162317
Q1: 307.6994978
median: 333.0735457
Q3: 359.9501704
95-th percentile: 403.0701898
Maximum: 481.0306423
Range: 352.0306423
Interquartile range (IQR): 52.25067255

Descriptive statistics

Standard deviation: 41.41684046
Coefficient of variation (CV): 0.1240858186
Kurtosis: 0.6482628151
Mean: 333.7757766
Median Absolute Deviation (MAD): 26.0951759
Skewness: -0.03594662163
Sum: 832770.5626
Variance: 1715.354674
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
280.7456229           1      < 0.1%
332.7445192           1      < 0.1%
391.9182286           1      < 0.1%
330.9053704           1      < 0.1%
402.3134271           1      < 0.1%
360.6978151           1      < 0.1%
336.0404518           1      < 0.1%
405.5273372           1      < 0.1%
346.0636768           1      < 0.1%
368.5164413           1      < 0.1%
Other values (2485)   2485   75.9%
(Missing)             781    23.8%

Value                 Count  Frequency (%)
129                   1      < 0.1%
180.2067464           1      < 0.1%
182.3973702           1      < 0.1%
187.1707144           1      < 0.1%
187.4241309           1      < 0.1%
192.0335917           1      < 0.1%
203.4445208           1      < 0.1%
205.9350906           1      < 0.1%
206.2472294           1      < 0.1%
207.8904823           1      < 0.1%

Value                 Count  Frequency (%)
481.0306423           1      < 0.1%
476.5397173           1      < 0.1%
475.7374602           1      < 0.1%
462.474215            1      < 0.1%
460.107069            1      < 0.1%
458.4410723           1      < 0.1%
455.4512337           1      < 0.1%
450.9144544           1      < 0.1%
449.2676875           1      < 0.1%
447.4179624           1      < 0.1%

Conductivity
Real number (ℝ≥0)

UNIQUE

Distinct: 3276
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 426.2051107
Minimum: 181.483754
Maximum: 753.3426196
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 181.483754
5-th percentile: 300.1094657
Q1: 365.7344142
median: 421.8849683
Q3: 481.7923045
95-th percentile: 566.3493199
Maximum: 753.3426196
Range: 571.8588656
Interquartile range (IQR): 116.0578904

Descriptive statistics

Standard deviation: 80.82406405
Coefficient of variation (CV): 0.1896365436
Kurtosis: -0.2770928328
Mean: 426.2051107
Median Absolute Deviation (MAD): 57.8875912
Skewness: 0.2644902239
Sum: 1396247.943
Variance: 6532.52933
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
564.3086542           1      < 0.1%
418.6420628           1      < 0.1%
517.5767619           1      < 0.1%
235.0422835           1      < 0.1%
501.5597252           1      < 0.1%
452.1872326           1      < 0.1%
367.8540248           1      < 0.1%
400.6118991           1      < 0.1%
469.1321169           1      < 0.1%
482.5957093           1      < 0.1%
Other values (3266)   3266   99.7%

Value                 Count  Frequency (%)
181.483754            1      < 0.1%
201.6197368           1      < 0.1%
210.319182            1      < 0.1%
217.3583296           1      < 0.1%
232.613624            1      < 0.1%
233.9079651           1      < 0.1%
235.0422835           1      < 0.1%
245.859632            1      < 0.1%
247.9180305           1      < 0.1%
251.0208987           1      < 0.1%

Value                 Count  Frequency (%)
753.3426196           1      < 0.1%
708.2263645           1      < 0.1%
695.369528            1      < 0.1%
674.4434759           1      < 0.1%
672.5569992           1      < 0.1%
669.7250862           1      < 0.1%
666.6906183           1      < 0.1%
660.2549463           1      < 0.1%
657.5704218           1      < 0.1%
656.9241278           1      < 0.1%

Organic_carbon
Real number (ℝ≥0)

UNIQUE

Distinct: 3276
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.28497025
Minimum: 2.2
Maximum: 28.3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 2.2
5-th percentile: 8.815361702
Q1: 12.06580133
median: 14.21833794
Q3: 16.55765154
95-th percentile: 19.63725444
Maximum: 28.3
Range: 26.1
Interquartile range (IQR): 4.491850208

Descriptive statistics

Standard deviation: 3.308161999
Coefficient of variation (CV): 0.2315834014
Kurtosis: 0.04440930721
Mean: 14.28497025
Median Absolute Deviation (MAD): 2.23229412
Skewness: 0.02553258198
Sum: 46797.56253
Variance: 10.94393581
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
10.37978308           1      < 0.1%
12.89763545           1      < 0.1%
15.87176979           1      < 0.1%
11.545477             1      < 0.1%
12.28433352           1      < 0.1%
18.58495937           1      < 0.1%
21.30064694           1      < 0.1%
15.28878163           1      < 0.1%
16.1692117            1      < 0.1%
12.16473568           1      < 0.1%
Other values (3266)   3266   99.7%

Value                 Count  Frequency (%)
2.2                   1      < 0.1%
4.371898608           1      < 0.1%
4.466771969           1      < 0.1%
4.473092264           1      < 0.1%
4.861631498           1      < 0.1%
4.902888068           1      < 0.1%
4.966861619           1      < 0.1%
5.051694615           1      < 0.1%
5.159380308           1      < 0.1%
5.188466455           1      < 0.1%

Value                 Count  Frequency (%)
28.3                  1      < 0.1%
27.00670661           1      < 0.1%
24.75539237           1      < 0.1%
23.95245044           1      < 0.1%
23.91760126           1      < 0.1%
23.66766678           1      < 0.1%
23.60429797           1      < 0.1%
23.56964491           1      < 0.1%
23.51477377           1      < 0.1%
23.39951606           1      < 0.1%

Trihalomethanes
Real number (ℝ≥0)

MISSING

Distinct: 3114
Distinct (%): 100.0%
Missing: 162
Missing (%): 4.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.39629295
Minimum: 0.738
Maximum: 124
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 0.738
5-th percentile: 39.55292835
Q1: 55.84453562
median: 66.6224851
Q3: 77.3374729
95-th percentile: 92.12405947
Maximum: 124
Range: 123.262
Interquartile range (IQR): 21.49293728

Descriptive statistics

Standard deviation: 16.17500842
Coefficient of variation (CV): 0.2436131251
Kurtosis: 0.2385974401
Mean: 66.39629295
Median Absolute Deviation (MAD): 10.74217213
Skewness: -0.0830306741
Sum: 206758.0562
Variance: 261.6308975
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
86.99097046           1      < 0.1%
56.71550955           1      < 0.1%
77.73081437           1      < 0.1%
90.39489472           1      < 0.1%
37.78709664           1      < 0.1%
78.9255271            1      < 0.1%
89.47771837           1      < 0.1%
69.526718             1      < 0.1%
72.57395938           1      < 0.1%
57.78086932           1      < 0.1%
Other values (3104)   3104   94.7%
(Missing)             162    4.9%

Value                 Count  Frequency (%)
0.738                 1      < 0.1%
8.175876384           1      < 0.1%
8.577012933           1      < 0.1%
14.34316145           1      < 0.1%
15.6848768            1      < 0.1%
16.2915046            1      < 0.1%
17.00068293           1      < 0.1%
17.52776496           1      < 0.1%
17.91572257           1      < 0.1%
18.01527236           1      < 0.1%

Value                 Count  Frequency (%)
124                   1      < 0.1%
120.030077            1      < 0.1%
118.3572747           1      < 0.1%
116.1616216           1      < 0.1%
114.2086714           1      < 0.1%
114.0349457           1      < 0.1%
113.0488857           1      < 0.1%
112.622733            1      < 0.1%
112.4122104           1      < 0.1%
112.0610274           1      < 0.1%

Turbidity
Real number (ℝ≥0)

UNIQUE

Distinct: 3276
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.96678617
Minimum: 1.45
Maximum: 6.739
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.7 KiB

Quantile statistics

Minimum: 1.45
5-th percentile: 2.684279234
Q1: 3.43971087
median: 3.955027563
Q3: 4.500319787
95-th percentile: 5.220924525
Maximum: 6.739
Range: 5.289
Interquartile range (IQR): 1.060608917

Descriptive statistics

Standard deviation: 0.7803824085
Coefficient of variation (CV): 0.1967291341
Kurtosis: -0.06280064054
Mean: 3.96678617
Median Absolute Deviation (MAD): 0.5302962355
Skewness: -0.007816642358
Sum: 12995.19149
Variance: 0.6089967035
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
2.963135381           1      < 0.1%
3.987012091           1      < 0.1%
4.066229364           1      < 0.1%
3.759326201           1      < 0.1%
4.876273              1      < 0.1%
5.143750122           1      < 0.1%
4.513200539           1      < 0.1%
4.20418585            1      < 0.1%
4.586748359           1      < 0.1%
4.910911021           1      < 0.1%
Other values (3266)   3266   99.7%

Value                 Count  Frequency (%)
1.45                  1      < 0.1%
1.492206615           1      < 0.1%
1.496100943           1      < 0.1%
1.64151501            1      < 0.1%
1.659799385           1      < 0.1%
1.680554025           1      < 0.1%
1.687624505           1      < 0.1%
1.801326999           1      < 0.1%
1.81252894            1      < 0.1%
1.844371604           1      < 0.1%

Value                 Count  Frequency (%)
6.739                 1      < 0.1%
6.494748556           1      < 0.1%
6.494249467           1      < 0.1%
6.389161009           1      < 0.1%
6.35743852            1      < 0.1%
6.307678472           1      < 0.1%
6.226580405           1      < 0.1%
6.204846359           1      < 0.1%
6.099631873           1      < 0.1%
6.083772354           1      < 0.1%

Potability
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.7 KiB
0: 1998
1: 1278

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1998   61.0%
1      1278   39.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1998   61.0%
1      1278   39.0%
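
The frequency column in the table above is each class count divided by the number of observations. The sketch below rebuilds the table from a label list with the same class counts; the list is reconstructed from the reported counts rather than read from the dataset, and the variable names are illustrative.

```python
from collections import Counter

# Label list rebuilt from the counts reported above (1998 zeros, 1278 ones).
labels = [0] * 1998 + [1] * 1278

counts = Counter(labels)
total = sum(counts.values())

# Map each class to its share of all observations, in percent.
frequencies = {value: 100.0 * count / total for value, count in counts.items()}

for value, count in counts.most_common():
    print(f"{value}  {count}  {frequencies[value]:.1f}%")
```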

Most occurring characters

Value  Count  Frequency (%)
No values found.

Most occurring categories

Value  Count  Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value  Count  Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value  Count  Frequency (%)
No values found.

Most frequent character per block

Interactions

[Figures: 81 interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
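
The calculation described above can be sketched as: rank both variables, then compute the covariance of the ranks divided by the product of their standard deviations. The helper names and the toy data are illustrative and are not taken from pandas-profiling; tied values receive the average of their rank positions, which is a common convention and an assumption here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

On this toy data the rank correlation is 1 up to floating-point rounding, while the linear correlation stays below 1, which is the difference the paragraph above describes.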

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
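
A minimal sketch of that calculation, together with a check of the invariance under separate changes in location and scale mentioned above. The function name and the data are made up for illustration, and the scale factors used are positive.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and of the standard deviations cancel,
    # so sums of products and sums of squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Made-up data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)

# Shift and rescale each variable separately.
x_moved = [10.0 * a - 3.0 for a in x]
y_moved = [0.5 * b + 100.0 for b in y]
r_moved = pearson_r(x_moved, y_moved)

print(r, r_moved, isclose(r, r_moved))
```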

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
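
The pair-counting definition above can be sketched directly. This is the plain form as described, with no correction for ties; whether the report itself uses a tie-corrected variant is not stated here. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau(x, y))  # 0.4
```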

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
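
Nullity correlation can be sketched as the Pearson correlation between the missingness indicators of two columns (1 where a value is missing, 0 where it is present). That reading is an assumption based on the heatmap description above. The function name is hypothetical, and the data are the ph and Sulfate values of the ten first rows shown under Sample, with NaN written as None.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    if ss_a == 0 or ss_b == 0:
        # Undefined when a column is never (or always) missing.
        return float("nan")
    return cov / sqrt(ss_a * ss_b)


# ph and Sulfate for the ten first rows of this report (None marks NaN).
ph = [None, 3.716080, 8.099124, 8.316766, 9.092223,
      5.584087, 10.223862, 8.635849, None, 11.180284]
sulfate = [368.516441, None, None, 356.886136, 310.135738,
           326.678363, 393.663395, 303.309771, 268.646941, 404.041635]

print(nullity_correlation(ph, sulfate))
```

A negative value means the two columns tend not to be missing in the same rows. Ten rows are far too few to say anything about the full dataset, so the output only illustrates the calculation.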

Sample

First rows

      ph         Hardness    Solids       Chloramines  Sulfate     Conductivity  Organic_carbon  Trihalomethanes  Turbidity  Potability
0     NaN        204.890456  20791.31898  7.300212     368.516441  564.308654    10.379783       86.990970        2.963135   0
1     3.716080   129.422921  18630.05786  6.635246     NaN         592.885359    15.180013       56.329076        4.500656   0
2     8.099124   224.236259  19909.54173  9.275884     NaN         418.606213    16.868637       66.420093        3.055934   0
3     8.316766   214.373394  22018.41744  8.059332     356.886136  363.266516    18.436525       100.341674       4.628771   0
4     9.092223   181.101509  17978.98634  6.546600     310.135738  398.410813    11.558279       31.997993        4.075075   0
5     5.584087   188.313324  28748.68774  7.544869     326.678363  280.467916    8.399735        54.917862        2.559708   0
6     10.223862  248.071735  28749.71654  7.513408     393.663395  283.651634    13.789695       84.603556        2.672989   0
7     8.635849   203.361523  13672.09176  4.563009     303.309771  474.607645    12.363817       62.798309        4.401425   0
8     NaN        118.988579  14285.58385  7.804174     268.646941  389.375566    12.706049       53.928846        3.595017   0
9     11.180284  227.231469  25484.50849  9.077200     404.041635  563.885481    17.927806       71.976601        4.370562   0

Last rows

      ph         Hardness    Solids       Chloramines  Sulfate     Conductivity  Organic_carbon  Trihalomethanes  Turbidity  Potability
3266  8.372910   169.087052  14622.74549  7.547984     NaN         464.525552    11.083027       38.435151        4.906358   1
3267  8.989900   215.047358  15921.41202  6.297312     312.931021  390.410231    9.899115        55.069304        4.613843   1
3268  6.702547   207.321086  17246.92035  7.708117     304.510230  329.266002    16.217303       28.878601        3.442983   1
3269  11.491011  94.812545   37188.82602  9.263166     258.930600  439.893618    16.172755       41.558501        4.369264   1
3270  6.069616   186.659040  26138.78019  7.747547     345.700257  415.886955    12.067620       60.419921        3.669712   1
3271  4.668102   193.681736  47580.99160  7.166639     359.948574  526.424171    13.894419       66.687695        4.435821   1
3272  7.808856   193.553212  17329.80216  8.061362     NaN         392.449580    19.903225       NaN              2.798243   1
3273  9.419510   175.762646  33155.57822  7.350233     NaN         432.044783    11.039070       69.845400        3.298875   1
3274  5.126763   230.603758  11983.86938  6.303357     NaN         402.883113    11.168946       77.488213        4.708658   1
3275  7.874671   195.102299  17404.17706  7.509306     NaN         327.459761    16.140368       78.698446        2.309149   1